
6 research outputs found

Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers

Author: Córdoba Herralde Ricardo de
D'Haro Enríquez Luis Fernando
Ferreiros López Javier
Salamea Palacios Christian Raúl
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2014

A new language recognition technique is described, based on applying the philosophy of Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features. The new methodology allows long-span phonetic information to be incorporated at a frame-by-frame level while dealing with the temporal length of each phone unit. The proposed features are used to train an i-vector based system and are tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg in comparison with different state-of-the-art acoustic i-vector based systems. In addition, the integration of parallel phone ASR systems is presented, where each system generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, the paper shows how incorporating state information from the phone ASR provides additional improvements, and how fusion with the other acoustic and phonotactic systems provides an important improvement of 25.8% over the system presented during the competition.
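As a rough illustration of the two building blocks named in this abstract, the sketch below computes PLLR features as the logit of per-frame phone posteriors and then applies the classic SDC stacking to them. It is not the authors' variant, which additionally accounts for the temporal length of each phone unit. The function names and the default values of `d`, `P` and `k` are assumptions chosen for the example, not settings taken from the paper.

```python
import math

def pllr(posteriors, eps=1e-10):
    """Turn one frame of phone posteriors into PLLR features (logit of each posterior)."""
    out = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0) and division by zero
        out.append(math.log(p / (1.0 - p)))
    return out

def sdc(frames, d=1, P=3, k=7):
    """Classic shifted delta coefficients over a list of feature frames.

    For each frame t, k delta blocks are stacked; block i is
    frames[t + i*P + d] - frames[t + i*P - d]. Indices are clamped at the edges.
    The defaults are illustrative assumptions only.
    """
    T = len(frames)

    def at(t):
        return frames[min(max(t, 0), T - 1)]

    out = []
    for t in range(T):
        vec = []
        for i in range(k):
            ahead = at(t + i * P + d)
            behind = at(t + i * P - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out
```

Each output frame has `k` times the input dimension, which is how context from later frames is carried at the frame-by-frame level.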

Archivo Digital UPM

Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition

Author: Córdoba Herralde Ricardo de
D'Haro Enríquez Luis Fernando
Echeverry Correa Julian David
Salamea Palacios Christian Raúl
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2014

This paper presents new techniques that bring relevant improvements to the primary system our group submitted to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. We present the incorporation of an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric during the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.

Crossref

Archivo Digital UPM

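Incorporación de n-gramas discriminativos para mejorar un reconocedor de idioma fonotáctico basado en i-vectores [Incorporating discriminative n-grams to improve an i-vector-based phonotactic language recognizer]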

Author: Caraballo Morcillo Miguel Ángel
Córdoba Herralde Ricardo de
D'Haro Enríquez Luis Fernando
Salamea Palacios Christian Raúl
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2013

This paper describes a new technique for combining the information from two different phonotactic systems in order to improve the results of an automatic language recognition system. The first system is based on posteriorgram counts used to generate i-vectors; the second is a variant of the first that takes into account the most discriminative n-grams according to their occurrence in one language versus all the others. The proposed technique yields a relative improvement of 8.63% in Cavg on the evaluation data used for the ALBAYZIN 2012 LRE competition.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

On the use of phone-gram units in recurrent neural networks for language identification

Author: Córdoba Herralde Ricardo de
D'Haro Enríquez Luis Fernando
Salamea Palacios Christian Raúl
San Segundo Hernández Rubén
Publication venue: International Speech Communication Association
Publication date: 01/06/2016

In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and with different phonetic ASR recognizers. To avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed on phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, batch size, hidden neurons and state unfolding are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. The resulting RNNLM scores are also calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements, showing that the systems contribute complementary information to the LID system.
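The vocabulary-reduction step mentioned in this abstract can be pictured with a small K-means sketch: phone-gram embeddings are clustered and each phone-gram is replaced by its cluster index before language-model training. Everything here is a hypothetical illustration: the function name, the toy two-dimensional embeddings and the deterministic initialization are assumptions, and the abstract does not specify the authors' embedding model or K-means settings.

```python
def kmeans(points, k, iters=20):
    """Plain K-means with squared Euclidean distance.

    Initialization uses the first k points (an assumption made for determinism).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy embeddings for four hypothetical phone-grams.
phone_grams = ["a_b", "x_y", "a_c", "x_z"]
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]

labels, _ = kmeans(embeddings, k=2)
# Map every phone-gram to its class index, shrinking the vocabulary to k symbols.
class_of = dict(zip(phone_grams, labels))
```

With this mapping, a sequence of phone-grams becomes a sequence of class indices drawn from a much smaller vocabulary.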

Archivo Digital UPM

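Diseño y evaluación de técnicas de reconocimiento de idioma mediante la fusión de información fonotáctica y acústica [Design and evaluation of language recognition techniques through the fusion of phonotactic and acoustic information]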

Author: Salamea Palacios Christian Raúl
Publication venue: Universidad Politécnica de Madrid - University Library
Publication date: 01/01/2018

The application of phonotactic techniques in language recognition systems has long been an area of active study because, when used correctly, they lead to significant improvements in system performance. The way a language is realized acoustically and the set of phonetic characteristics produced in speech are key elements in the language identification (LID) task. How efficiently those phonetic characteristics are captured is a determining factor in the quality of the recognizer. Although current systems achieve very reasonable accuracy, problems remain: for example, the computational resources required to process the information, and the amount of training data needed for automatic systems to adequately incorporate the characteristic information of the languages to be recognized.
Deep neural networks, and recurrent ones in particular, have proved efficient at modeling the phonetic characteristics of languages and are therefore being used for several tasks in speech recognition and speaker/language identification. Language models are generated at two levels, lexical and phonetic. In this thesis we use a phonotactic system that can exploit more context information; to that end, we use phonetic n-gram units that aim to capture the phonotactic characteristics of a language while providing more context than a single phoneme. We explore the use of these phonetic n-gram units in LID tasks, identifying optimal configuration values and presenting different techniques, all in the context of building language models based on recurrent neural networks.
Also within the phonotactic approach, we introduce the idea of using the vector representation of phonetic n-grams for LID tasks, setting aside the concept of a language model that predicts new information from past information in favor of models based on the context and the target phonetic n-grams. We have also studied phonotactic systems for LID based on i-vectors. The use of discriminative information and PLLR coefficients has opened up new alternatives for the LID task, and we have studied ways of extending the context considered by those coefficients to improve their performance. All of the studies were carried out on the KALAKA-3 database used in the ALBAYZIN-LRE2012 evaluation, because of the good balance between the size of the database and the difficulty of the task in terms of execution time.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Calibración automática en filtros adaptativos para el procesamiento de señales EMG [Automatic calibration in adaptive filters for EMG signal processing]

Author: Luna Romero Santiago
Salamea Palacios Christian Raúl
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2011

This work proposes an adaptive filtering scheme that includes an automatic calibration stage for acquiring EMG (electromyography) signals. We propose a novel technique called “autocalibration” to minimize the noise caused by the contact between the skin and the sensors (electrodes) during physical activity. Adaptive filtering is used because both physical activity and sweating alter the measurement conditions. The experiments were carried out with people performing physical activity at different levels of effort. The relative improvement of the signal-to-noise ratio (RI-SNR) was used to compare the proposed technique with adaptive filters that use “white noise” as the reference signal. The work focuses on the Wiener, LMS and RLS estimators, with measurements taken before and after the physical activity. The proposed technique achieves an improvement of up to 45.49% compared with the corresponding filter that uses “white noise” for calibration.
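For readers unfamiliar with the LMS estimator named in this abstract, the following is a minimal textbook sketch of an LMS adaptive noise canceller. It does not reproduce the authors' autocalibration stage, which the abstract does not detail. The function name, tap count and step size `mu` are assumptions chosen for illustration.

```python
def lms_filter(primary, reference, n_taps=4, mu=0.01):
    """Textbook LMS adaptive noise canceller (illustrative sketch).

    `primary` is the noisy measurement and `reference` is a signal correlated
    with the noise. Returns (cleaned, weights), where `cleaned` is the error
    signal, that is, the primary input minus the estimated noise.
    """
    w = [0.0] * n_taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        x = [reference[n - i] if n - i >= 0 else 0.0 for i in range(n_taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))        # current noise estimate
        e = primary[n] - y                              # cleaned output sample
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # LMS weight update
        cleaned.append(e)
    return cleaned, w
```

In this arrangement the choice of reference signal determines what the filter can remove, which is the role the abstract assigns to calibration.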

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM